id: a unique identifier for each victim
name: the name of the victim
date: the date of the fatal shooting
manner_of_death:
    shot
    shot and Tasered
armed: indicates that the victim was armed with some sort of implement that a police officer believed could inflict harm
    undetermined: it is not known whether or not the victim had a weapon
    unknown: the victim was armed, but it is not known what the object was
    unarmed: the victim was not armed
age: the age of the victim
gender: the gender of the victim. The Post identifies victims by the gender they identify with if reports indicate that it differs from their biological sex.
    M: Male
    F: Female
    None: unknown
race:
    W: White, non-Hispanic
    B: Black, non-Hispanic
    A: Asian
    N: Native American
    H: Hispanic
    O: Other
    None: unknown
city: the municipality where the fatal shooting took place. Note that in some cases this field may contain a county name if a more specific municipality is unavailable or unknown.
state: the state in which the incident took place, given as a two-letter postal abbreviation.
signs_of_mental_illness: News reports have indicated the victim had a history of mental health issues, expressed suicidal intentions or was experiencing mental distress at the time of the shooting.
threat_level: this column was used to flag incidents for the story. The attack label generally marks cases with the most direct and immediate threat to life, including incidents where officers or others were shot at, threatened with a gun, or attacked with other weapons or physical force. It is meant to flag the highest level of threat. The other and undetermined categories cover all remaining cases; other still includes many incidents where officers or others faced significant threats.
flee: News reports have indicated the victim was moving away from officers
    Foot
    Car
    Not fleeing
body_camera: News reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident.
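The single-letter codes in the race column can be expanded into readable labels before plotting. The following is a minimal sketch based on the code list in the data dictionary; `RACE_LABELS` and the toy frame `sample` are hypothetical names introduced for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical lookup built from the race codes in the data dictionary.
RACE_LABELS = {
    "W": "White, non-Hispanic",
    "B": "Black, non-Hispanic",
    "A": "Asian",
    "N": "Native American",
    "H": "Hispanic",
    "O": "Other",
}

# Toy rows standing in for the real file (illustrative values only).
sample = pd.DataFrame({"race": ["W", "B", None, "H"]})

# Codes absent from the lookup, including missing values, become "Unknown".
sample["race_label"] = sample["race"].map(RACE_LABELS).fillna("Unknown")
print(sample)
```

The same pattern should apply to the gender codes (M, F) if readable legends are wanted.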
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import numpy as np
import seaborn as sns
sns.set_theme(style="darkgrid")
import missingno as msno
import sidetable
%matplotlib inline
data=pd.read_csv('fatal-police-shootings-data.csv')
df=data.head()
df
data.info()
data.drop(['name'], axis=1, inplace=True)
data.isna().sum()
sns.heatmap(data.isna(),yticklabels=False,cbar=False,cmap='viridis')
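The heatmap shows where values are missing, but not how much of each column is affected. As a rough sketch, the share of missing entries per column can be computed with `isna().mean()`; the frame `toy` below is a hypothetical stand-in for `data` with made-up values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data` (illustrative values only).
toy = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0],
    "race": ["W", None, None, "B"],
    "state": ["CA", "TX", "NY", "FL"],
})

# isna().mean() gives the fraction of missing entries in each column;
# multiplying by 100 turns it into a percentage.
missing_pct = (toy.isna().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```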
***We can see that there are empty entries in the following columns:
duplicate=data.duplicated()
data[duplicate]
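# Sort by id; sort_values returns a new DataFrame, so assign the result back
# and reset the index so the row order matches the new ids created below
data = data.sort_values(by='id').reset_index(drop=True)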
data.drop(['id'], axis=1,inplace=True)
data.insert(0, 'id', data.index + 1)
data.head()
data['date_parsed']=pd.to_datetime(data['date'],format = "%Y-%m-%d")
data['date_parsed'].head()
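Once the date is parsed, the `.dt` accessor exposes its components, which is useful for counting incidents per year or month. The following is a small sketch on a hypothetical frame `toy`; the column names `year` and `month` are assumptions introduced here, not columns of the original file.

```python
import pandas as pd

# Toy dates standing in for the real `date` column (illustrative values only).
toy = pd.DataFrame({"date": ["2015-01-02", "2015-03-15", "2016-03-01"]})
toy["date_parsed"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# Extract calendar components from the parsed timestamps.
toy["year"] = toy["date_parsed"].dt.year
toy["month"] = toy["date_parsed"].dt.month

# Number of rows per year.
per_year = toy.groupby("year").size()
print(per_year)
```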
data.describe(include='all')
***1. The mean age of the people who died in police shootings is about 37.
print(data.flee.value_counts(),'\n')
print(data.armed.value_counts())
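# Assign the filled columns back rather than calling fillna(inplace=True) on an attribute
data['flee'] = data['flee'].fillna('Not fleeing')
data['armed'] = data['armed'].fillna(data['armed'].value_counts().index[0])  # most frequent value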
data.dropna(inplace=True)  # drop rows that still contain missing values; inplace=True modifies data directly instead of returning a copy
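The cleaning steps above combine two strategies: filling categorical gaps and dropping whatever rows remain incomplete. As a sketch of that sequence on made-up data, the example below also counts how many rows the final `dropna` removes; `toy`, `most_common_armed` and `rows_before` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data` (illustrative values only).
toy = pd.DataFrame({
    "armed": ["gun", "gun", None, "knife"],
    "flee": ["Car", None, "Foot", "Not fleeing"],
    "age": [30.0, 41.0, np.nan, 25.0],
})

# Fill categorical gaps: a fixed default for flee,
# and the most common value (the mode) for armed.
toy["flee"] = toy["flee"].fillna("Not fleeing")
most_common_armed = toy["armed"].mode().iloc[0]
toy["armed"] = toy["armed"].fillna(most_common_armed)

# Drop rows that still have a missing value and report how many were lost.
rows_before = len(toy)
toy = toy.dropna()
print(f"rows dropped: {rows_before - len(toy)}")
```

Checking the dropped-row count before committing to `dropna` helps confirm that the imputation did not hide a large loss of data.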
age=np.array(data['age'])
age_mean=round(np.nanmean(age),0)
data.isna().sum()
print(data.stb.freq(['race']),"\n")
print(data.stb.freq(['threat_level','flee']),"\n")
print(data.stb.freq(['state'],thresh=55),"\n")
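For readers without sidetable installed, a similar frequency table can be approximated with plain pandas. This is only a sketch on a hypothetical frame `toy`, and its column names (`count`, `percent`, `cumulative_percent`) are chosen here for illustration rather than taken from sidetable's output.

```python
import pandas as pd

# Toy column standing in for data['race'] (illustrative values only).
toy = pd.DataFrame({"race": ["W", "W", "B", "H", "W", "B"]})

# value_counts sorts categories from most to least frequent.
counts = toy["race"].value_counts()

freq = pd.DataFrame({
    "count": counts,
    "percent": (counts / counts.sum() * 100).round(2),
    "cumulative_percent": (counts.cumsum() / counts.sum() * 100).round(2),
})
print(freq)
```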
***Inference:
In the last five years:
sns.catplot(x="age",y="race",hue="gender",orient="h",row="threat_level",kind="box",data=data)
sns.histplot(age, kde=True, stat="density")  # distplot is deprecated in recent seaborn releases
sns.catplot(x="threat_level",hue="signs_of_mental_illness",kind="count",data=data)